ABSTRACT

The Bayesian decision method has several features which
are desirable in the sampling inspection process for quality
control. These features include: (1) comparison of the value
of sample information in the decision process with the cost
of obtaining the information; (2) basing decisions on their
consequences to the decision maker; and (3) allowing the use
of subjective information in the decision process. In this
paper the Bayesian decision procedure as it applies to
variables sampling for quality control is examined. The
basic method is developed for both simultaneous and sequential
sampling and the modeling of decision consequences is
discussed. Various models for the production process are
provided and solutions for the generalized linear model
obtained. Finally, the incorporation of subjective information
is discussed.
I.   INTRODUCTION


One of the primary concerns in the quality control of
items produced by or received from a production line is the
procedure by which decisions are made concerning the quality
of the material produced. In order to develop a decision
procedure the abstract term "quality" must be operationally
defined. This definition usually takes the form of an
equipment specification which lists the characteristics required
of the unit to perform its intended function. In the case of
100% inspection every unit produced or received is subjected
to a test in which these characteristics are measured. Based
on the test results the decision procedure is to accept those
units which satisfy the specification and reject those which
do not. When sampling inspection is used, a sample of the
production output is tested. The test results of the sample
are then used to make an accept or reject decision concerning
the population or lot from which the sample was drawn.

The  decision  procedure  in  this  case  requires  that  a 
decision  function  be  specified  which  indicates,  for  the  test 
results  observed,  the  decision  to  be  made  (accept  or  reject). 
Unlike  100%  inspection,  in  sample  testing  there  always  exists 
the  possibility  that  the  decision  function  may  indicate  an 
erroneous  decision. 


The consequences of erroneous decisions represent a
loss to the decision maker and can range from mild to severe.
As an example, suppose a machine is judged to be out of
calibration when in fact it is not. Then the loss to the
decision maker would be the cost of a needless recalibration,
which may be small. On the other hand, if production quality
were judged to be acceptable when in fact it was not,
consumers might seek alternate sources of supply. This could
result in the loss of entire production contracts and
reputation. Although the above examples are oversimplified, the
main idea is that erroneous decisions always represent a loss
to the decision maker. Thus in the design of a decision
procedure the loss due to erroneous decisions must be
considered.

Another  consideration  in  the  design  of  a  decision 
procedure  is  the  cost  of  testing  the  sample  units.   This 
cost  includes  the  labor  and  test  facilities  required  and  may 
include  the  cost  of  the  units  themselves  (if  the  tests  are 
destructive)  or  repair  costs  if  the  units  fail.   These  costs 
may  be  large  if  complex  facilities  are  required  or  test  time 
is long. If the cost of testing is larger than the
anticipated consequences of a decision then the cost of information
is greater than its value in the decision process. Under
these circumstances, gathering further information (testing)
is counterproductive. Thus a decision procedure should
indicate the "value" of additional information to the decision
maker.


The most common method of specifying a decision
procedure is based on classical hypothesis testing. In this
approach two points on the operating characteristic (OC)
curve for the decision procedure are specified. The OC curve
for the procedure is the probability of acceptance plotted
against equipment quality. The required sample size and
accept/reject criteria are then developed based on the sampling
distribution using a likelihood ratio test. Another method which
provides greater flexibility and has features absent in the
classical method is the Bayesian decision approach. In the
Bayesian method the decision procedure is optimized for a loss
function specified by the user which reflects the particular
application. If the loss function and the cost of testing are
expressed in the same units, the cost of information can be
compared directly with its value. The Bayesian method also
contains the classical procedure as a special case. Another
feature of the Bayesian method is that specific knowledge of the
behavior of the production process as well as subjective
information can be incorporated, thus allowing the decision
procedure to adapt to changing requirements.

In  the  following  sections  the  basic  Bayesian  decision 
procedure  will  be  outlined  and  the  specification  of  loss 
functions  and  models  for  the  production  process  discussed. 
The  generalized  linear  model  is  introduced  and  the  recursion 
equations  developed  to  facilitate  calculation  of  posterior 
distributions.   Finally,  the  incorporation  of  subjective 
information  is  discussed. 
II.   THE  BAYESIAN  METHOD


A.    THE  GENERALIZED  BAYESIAN  DECISION  PROCEDURE 

In order to discuss the Bayesian method as applied to a
production line, a generalized model of the production and
sample test process is required. Let $\theta$ represent the
characteristic of the equipment upon which decisions are to be
made. For example, $\theta$ could be the average or mean gain of a
production lot of amplifiers. The actual value of $\theta$ is not
observable; however, we can perform tests which indicate the
gain of an individual amplifier. Let $x$ denote the result
of such a test. Also let $\theta$ and $x$ be related through a known
probability density function denoted by $f(x|\theta)$. As a model
of the generalized production process we assume a random
process such that for each time $t$, $\theta_t$ has a continuous
distribution. It is also assumed that there exists a time
increment $\Delta t > 0$ over which $\theta$ is constant, i.e.,

$$\theta_\tau = \theta_t \quad \text{for all } \tau \text{ with } t \le \tau < t + \Delta t .$$

This assumption implies that given a production increment of $n$
items produced during $\Delta t$, test results for each unit are
samples from $f(x|\theta_t)$. Figure 1 shows how $\theta$ might vary for
the generalized process. As shown in the figure, $\theta$ for the
increment $\Delta t$ is a fixed but unknown quantity. It is the units
produced during each $\Delta t$ which are the object of the decision
process. At each end point $t_1, t_2, \ldots$ a decision must be
made to either accept or reject the units produced in the
preceding time increment. The decisions will be based on the
sample data ($\underline{x}$) which provides information concerning the
true value of $\theta$. At time $t_0$ let the probability density
$f(\theta_0)$ represent the uncertainty as to the true value of $\theta_0$.
Since the value of $\theta$ at time $t$ may or may not be independent
of the previous values of $\theta$ ($\theta_{t-1}, \theta_{t-2}, \ldots$), let the
uncertainty of $\theta_t$ given ($\theta_{t-1}, \theta_{t-2}, \ldots$) be expressed by
means of the conditional density $f(\theta_t|\theta_{t-1}, \theta_{t-2}, \ldots)$, which
is known or can be estimated but is not necessarily the same for
all $t$.
Since the decisions (accept or reject) are to be based on the
characteristic $\theta$, it is necessary to specify a loss function
which indicates the consequences of a particular decision
when a specific value of $\theta$ obtains, for all possible values
of $\theta$ and all decisions. Let $\Theta$ represent the set of all
possible values of $\theta$ and let $D$ represent the set of all
decisions $d$. Then let $L(\theta,d)$ be the loss function, which is
known for all $d \in D$ and $\theta \in \Theta$. The loss for a particular
decision $d \in D$ depends on the actual value of $\theta$, but $\theta$ is
unknown. The expected loss of decision $d$ is the loss $L(\theta,d)$
weighted by the probability density of $\theta$ and integrated over
all possible values of $\theta$. Let $\rho(d)$ denote the expected loss
or risk of decision $d$; then:

$$\rho(d) = \int_\Theta L(\theta,d)\, f(\theta)\, d\theta \tag{1}$$
The optimal decision in terms of minimizing the risk is the
decision $d \in D$ which minimizes equation (1) and is denoted by
$d^*$; therefore:

$$\rho(d^*) = \min_{d \in D} \int_\Theta L(\theta,d)\, f(\theta)\, d\theta . \tag{2}$$


Thus given a loss function $L(\theta,d)$ and the distribution of
$\theta$, $f(\theta)$, the optimal decision is defined by equation (2).
The optimal decision ($d^*$) is usually referred to as the Bayes
decision against $f(\theta)$. Since the decisions (accept or reject)
will be based on the sample data $\underline{x} = [x_1, x_2, \ldots, x_n]$, it is
desirable to specify a decision function, denoted by $\delta(\underline{x})$,
which for every value of $\underline{x}$ observed specifies the decision
$d^*$, i.e., the decision which minimizes the risk. Let $S_n$ be
the set of all possible test results. Then the risk of the
decision function $\delta(\underline{x})$ is, by equation (1),

$$\rho(\delta(\underline{x})) = \int_\Theta \int_{S_n} L(\theta,\delta(\underline{x}))\, f(\underline{x}|\theta)\, f(\theta)\, d\underline{x}\, d\theta \tag{3}$$


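We want to find the decision function $\delta^*(\underline{x})$ which minimizes
the risk as expressed in equation (3). Assuming $L(\theta,\delta(\underline{x}))$ is
bounded, interchanging the order of integration yields:

$$\rho(\delta(\underline{x})) = \int_{S_n} \left[ \int_\Theta L(\theta,\delta(\underline{x}))\, f(\underline{x}|\theta)\, f(\theta)\, d\theta \right] d\underline{x} \tag{4}$$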


Equation (4) is minimized when the expression in brackets is
minimized for each value of $\underline{x} \in S_n$. Thus the optimal decision
function $\delta^*(\underline{x})$ would specify a decision $d^*$ which minimizes
the integral

$$\int_\Theta L(\theta,d)\, f(\underline{x}|\theta)\, f(\theta)\, d\theta . \tag{5}$$


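From Bayes theorem

$$f(\theta|\underline{x}) = \frac{f(\underline{x}|\theta)\, f(\theta)}{f(\underline{x})} \tag{6}$$

where $f(\underline{x}) = \int_\Theta f(\underline{x}|\theta)\, f(\theta)\, d\theta$ is a constant given a sample
value for $\underline{x}$. Then minimizing the integral of (5) is
equivalent to selecting a decision $d^*$ which minimizes:

$$\int_\Theta L(\theta,d)\, \frac{f(\underline{x}|\theta)\, f(\theta)}{f(\underline{x})}\, d\theta = \int_\Theta L(\theta,d)\, f(\theta|\underline{x})\, d\theta \tag{7}$$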


for each value of $\underline{x}$ observed. Thus it is not necessary to
determine in advance a decision function $\delta(\underline{x})$ which specifies
$d^*$ for all possible values of $\underline{x}$. As each $\underline{x}$ is observed the
posterior, $f(\theta|\underline{x})$, is calculated from the prior, $f(\theta)$, by
equation (6) and $d^*$ is chosen to satisfy

$$\rho(d^*) = \min_{d \in D} \int_\Theta L(\theta,d)\, f(\theta|\underline{x})\, d\theta \tag{8}$$


This is the same result as equation (2) except that the
posterior based on the test data $\underline{x}$ is used instead of the
prior. Thus equation (2) defines the optimal decision $d^*$
before and after sampling as long as the appropriate density
for $\theta$ is used.
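
The update of equation (6) followed by the minimization of equation (8)
can be sketched numerically. The sketch below is illustrative only: it
assumes that $\theta$ is discretized on a grid, that the prior and the
sampling density are normal, and that a two-valued constant loss applies.
The grid, the numerical values and all function names are assumptions
introduced here and do not come from the development above.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def posterior(grid, prior, xs, sd_x):
    """Equation (6) on a grid: weights proportional to f(x|theta) f(theta)."""
    post = []
    for theta, p in zip(grid, prior):
        like = 1.0
        for x in xs:  # samples are independent given theta
            like *= normal_pdf(x, theta, sd_x)
        post.append(like * p)
    return normalize(post)

def bayes_decision(grid, weights, loss, decisions):
    """Equations (2) and (8): return (d*, risk) minimizing the expected loss."""
    risks = {d: sum(loss(t, d) * w for t, w in zip(grid, weights))
             for d in decisions}
    best = min(risks, key=risks.get)
    return best, risks[best]

# Illustrative set-up; every number below is an assumption.
THETA0 = 10.0                                 # acceptable quality is theta >= THETA0
grid = [8.0 + 0.01 * i for i in range(401)]   # theta from 8 to 12
prior = normalize([normal_pdf(t, 10.2, 0.5) for t in grid])

def loss(theta, d):
    """Accepting a bad lot costs 5; rejecting a good lot costs 1."""
    if d == "accept":
        return 5.0 if theta < THETA0 else 0.0
    return 1.0 if theta >= THETA0 else 0.0

d_prior, risk_prior = bayes_decision(grid, prior, loss, ["accept", "reject"])
post = posterior(grid, prior, [10.6, 10.4, 10.7], sd_x=0.5)
d_post, risk_post = bayes_decision(grid, post, loss, ["accept", "reject"])
```

With these assumed numbers the Bayes decision against the prior is to
reject, while the Bayes decision against the posterior is to accept, and
the posterior risk is the smaller of the two.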

The next step is to determine whether the decision should
be made without sampling, based on the prior distribution
$f(\theta)$, or the sample tested and the decision based on the
posterior distribution $f(\theta|\underline{x})$. In order to determine which
action is optimal, the risk of obtaining an additional sample
and then proceeding in an optimal fashion must be obtained.
If the risk of selecting a decision immediately is greater
than the risk of obtaining a sample result and then proceeding
in an optimal manner, then the sample should be tested since
this is the minimum risk action. Let $\rho(\phi,\underline{x})$ denote the risk
of obtaining a sample $\underline{x}$ when the prior of $\theta$ is $\phi$ and then
proceeding in an optimal manner. Also let $\rho(\phi,d^*)$ denote the
risk of making decision $d^*$ when the prior of $\theta$ is $\phi$. Then
the following decision rule will be used.

If $\rho(\phi,d^*) > \rho(\phi,\underline{x})$, test the sample;    (9)

otherwise, make decision $d^*$. $\rho(\phi,d^*)$ is obtained from
equation (2) with $\phi = f(\theta)$:

$$\rho(\phi,d^*) = \min_{d \in D} \int_\Theta L(\theta,d)\, f(\theta)\, d\theta . \tag{10}$$


To determine $\rho(\phi,\underline{x})$ two cases must be distinguished: the
samples are tested simultaneously or sequential sampling is
used.

1.    Simultaneous Sampling

It has been assumed that the cost of obtaining
sample results is not zero. Therefore let $C_n(\underline{x})$ represent the
cost of testing the units 1, 2, ..., n where test results are
represented by the vector $\underline{x} = (x_1, x_2, \ldots, x_n)$. In many
cases the cost of testing is independent of the values
obtained, in which case the cost would be just $C_n$, the cost of
testing $n$ units. The expected loss of testing $n$ units and
then making the optimal decision $d^*$, plus the cost of testing,
is $\rho(\phi,\underline{x})$. The distribution of $\theta$ if $\underline{x}$ is observed will be
$f(\theta|\underline{x})$; thus from equation (10):

$$\rho(\phi,\underline{x}) = \int_{S_n} \left[ \min_{d \in D} \int_\Theta L(\theta,d)\, f(\theta|\underline{x})\, d\theta \right] f(\underline{x})\, d\underline{x} + E[C_n(\underline{x})] \tag{11}$$

Thus the decision procedure for the production lot would be
as follows:

1.  Prior to testing determine $\rho(\phi,d^*)$ from equation
(10) based on the prior $f(\theta)$.

2.  Determine the risk of testing, $\rho(\phi,\underline{x})$, from
equation (11).

3.  If $\rho(\phi,d^*) \le \rho(\phi,\underline{x})$, make decision $d^*$.

4.  Otherwise, test sample units to obtain data $\underline{x}$ and make
decision $d^*$ according to equation (8).
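
The four steps above can be sketched for one special case. The sketch
assumes a normal prior on $\theta$, normally distributed test results with
known standard deviation, a testing cost proportional to $n$, and a
two-valued constant loss. Under these assumptions the posterior depends on
the data only through the sample mean, so the integral over $S_n$ in
equation (11) reduces to a one-dimensional integral. All names and numbers
are illustrative assumptions.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def decision_risk(mean, sd, theta0, c_accept_bad, c_reject_good):
    """Equation (10) for a normal density on theta and a constant loss."""
    p_bad = norm_cdf((theta0 - mean) / sd)  # P(theta < theta0)
    return min(c_accept_bad * p_bad, c_reject_good * (1.0 - p_bad))

def sampling_risk(m0, s0, sd_x, n, unit_cost, theta0,
                  c_accept_bad, c_reject_good, steps=2000):
    """Equation (11): expected posterior risk over f(x) plus the testing cost."""
    post_var = 1.0 / (1.0 / s0 ** 2 + n / sd_x ** 2)
    pred_sd = math.sqrt(s0 ** 2 + sd_x ** 2 / n)  # sd of the sample mean under f(x)
    width = 16.0 / steps                          # integrate over +/- 8 predictive sd
    total = 0.0
    for i in range(steps):
        z = -8.0 + (i + 0.5) * width              # midpoint rule
        xbar = m0 + z * pred_sd
        post_mean = post_var * (m0 / s0 ** 2 + n * xbar / sd_x ** 2)
        risk = decision_risk(post_mean, math.sqrt(post_var), theta0,
                             c_accept_bad, c_reject_good)
        total += risk * norm_pdf(z) * width
    return total + n * unit_cost

def choose_action(rho_decide, rho_test):
    """Decision rule (9): test only if testing lowers the risk."""
    return "test" if rho_decide > rho_test else "decide"

# Illustrative numbers (assumptions).
rho_decide = decision_risk(10.2, 0.5, 10.0, 5.0, 1.0)
rho_cheap = sampling_risk(10.2, 0.5, 0.5, 3, 0.02, 10.0, 5.0, 1.0)
rho_costly = sampling_risk(10.2, 0.5, 0.5, 3, 0.20, 10.0, 5.0, 1.0)
action_cheap = choose_action(rho_decide, rho_cheap)
action_costly = choose_action(rho_decide, rho_costly)
```

With a low unit cost the rule selects testing; when the unit cost is raised
tenfold the cost of the information exceeds its value and the rule selects
an immediate decision.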

2.    Sequential Sampling

The risk of sampling in a sequential procedure
differs from the simultaneous case because after a sample is
tested two actions are possible: (a) make a decision or (b)
continue sampling. Because of this, determining the risk of
a sequential procedure is, in general, more difficult than
determining the risk of the simultaneous case just discussed.
In most cases of practical interest the sample size has a
fixed upper bound. Let $n$ denote the maximum number of samples
available for testing. Then the risk of testing the first
unit and proceeding in an optimal fashion is the risk of the
$n$ step sampling procedure where after each sample is tested
the risk of continuing is compared with the risk of choosing
a decision and the minimum of the two risks is chosen. After
the first sample is taken, the risk of choosing a decision
must be compared to the risk of continuing with the $n-1$ step
sampling procedure. Thus as each sample is drawn the risk of
continuing changes due to the change in the number of samples
remaining as well as the new prior based on the samples
observed. The above process may be viewed as a decision tree,
shown in Figure 2, which depicts the sequential decision
process for $n = 4$. At each step $k$, $k = 0, 1, 2, \ldots, n$, the
risk of making an immediate decision is denoted by $\rho(\phi_k,d^*)$,
where $\phi_k$ is the distribution of $\theta$ based on $k$ samples, and is
defined in equation (10). At each step this must be compared
with the risk of continuing the sequential test process and
the minimum risk action chosen. The risk of continuing at
each step is denoted by $\rho(\phi_k,\underline{x}_{k+1})$ where
$\underline{x}_{k+1} = [x_{k+1}, x_{k+2}, \ldots, x_{n-1}, x_n]$, indicating the dependence
on the current prior and the remaining samples.

The difficulty alluded to earlier is in obtaining
values for $\rho(\phi_k,\underline{x}_{k+1})$. The general solution procedure uses
a backward induction starting at the last step and working
backward to the first step to obtain the continuation risk at
each step. For the $n = 4$ case depicted in Figure 2 the
procedure would be as follows. At step 3, after three samples
have been observed, the optimal action would be the one attaining
the minimum of $\rho(\phi_3,d^*)$ and $\rho(\phi_3,\underline{x}_4)$, where $\rho(\phi_3,d^*)$ is the
risk of the optimal decision given $\phi_3$ as defined in equation
(10) and $\rho(\phi_3,\underline{x}_4)$ is the risk of obtaining an additional sample
$x_4$ and then making the optimal decision. Since a decision must
be made after $x_4$ is observed this is the same as the sampling
risk for simultaneous sampling defined in equation (11) with
$\underline{x}$ equal to $x_4$. Thus the risk at step 3 is a function of $\phi_3$,
denoted by $\rho(\phi_3)$, and would be

$$\rho(\phi_3) = \min[\rho(\phi_3,d^*),\ \rho(\phi_3,\underline{x}_4)] . \tag{12}$$


The risk at step 2 will be the minimum of the decision risk
$\rho(\phi_2,d^*)$ and the continuation risk $\rho(\phi_2,\underline{x}_3)$. The continuation
risk $\rho(\phi_2,\underline{x}_3)$ is the expected value of the risk at step 3
based on the sample $x_3$. Thus,

$$\rho(\phi_2,\underline{x}_3) = E[\rho(\phi_3(x_3))] = \int \min[\rho(\phi_3(x_3),d^*),\ \rho(\phi_3(x_3),\underline{x}_4)]\, f(x_3)\, dx_3 + C_3 \tag{13}$$

where:   $\phi_3(x_3)$ is $\phi_2$ given $x_3$, i.e., $f_2(\theta|x_3)$

         $f(x_3) = \int_\Theta f(x_3|\theta)\, f_2(\theta)\, d\theta$

         $C_3$ = expected cost of obtaining value $x_3$

Thus at step 2, the optimal action again being that with
minimal risk, the risk is

$$\rho(\phi_2) = \min[\rho(\phi_2,d^*),\ \rho(\phi_2,\underline{x}_3)] . \tag{14}$$
This approach is then repeated for step 1, which yields

$$\rho(\phi_1,\underline{x}_2) = E[\rho(\phi_2(x_2))] = \int \min[\rho(\phi_2(x_2),d^*),\ \rho(\phi_2(x_2),\underline{x}_3)]\, f(x_2)\, dx_2 + C_2 \tag{15}$$

and

$$\rho(\phi_1) = \min[\rho(\phi_1,d^*),\ \rho(\phi_1,\underline{x}_2)] . \tag{16}$$


Thus at step 0, the beginning of the procedure, the risk of
the entire sequential test procedure would be

$$\rho(\phi_0) = \min[\rho(\phi_0,d^*),\ \rho(\phi_0,\underline{x}_1)] \tag{17}$$

where $\rho(\phi_0,\underline{x}_1) = E[\rho(\phi_1(x_1))]$ is the risk of the entire
sequential sampling plan.

As seen from the above discussion, determining the
risk of a sequential procedure is a non-trivial exercise,
the degree of difficulty depending on the sample size $n$, the
loss function $L(\theta,d)$ and the sampling and parameter
distributions, $f(x|\theta)$ and $f(\theta)$. Examples of the above procedure
for sequential testing may be found in the open literature
[1, 2 and 3].
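
The backward induction of equations (12) through (17) can be sketched as a
recursion on the number of units remaining. The sketch is not the general
solution: it assumes a normal prior, normal test results with known standard
deviation, a fixed cost per unit tested and a two-valued constant loss, so
that the prior at each step is summarized by its mean and variance. The
expectation over the next test result is approximated by a coarse midpoint
rule. All names and numbers are illustrative assumptions.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def decision_risk(mean, var, theta0, c_accept_bad, c_reject_good):
    """Risk of the best immediate decision, equation (10)."""
    p_bad = norm_cdf((theta0 - mean) / math.sqrt(var))
    return min(c_accept_bad * p_bad, c_reject_good * (1.0 - p_bad))

def sequential_risk(mean, var, remaining, sd_x, unit_cost,
                    theta0, c_accept_bad, c_reject_good, nodes=15):
    """rho(phi_k) of equations (12)-(17) for a normal prior (mean, var)."""
    stop = decision_risk(mean, var, theta0, c_accept_bad, c_reject_good)
    if remaining == 0:
        return stop  # no units left, so a decision must be made
    new_var = 1.0 / (1.0 / var + 1.0 / sd_x ** 2)  # variance after one more unit
    pred_sd = math.sqrt(var + sd_x ** 2)           # sd of the next result under f(x)
    width = 8.0 / nodes                            # nodes span +/- 4 predictive sd
    expected = 0.0
    weight_sum = 0.0
    for i in range(nodes):
        z = -4.0 + (i + 0.5) * width
        x = mean + z * pred_sd                     # a possible next test result
        new_mean = new_var * (mean / var + x / sd_x ** 2)
        w = math.exp(-0.5 * z * z)                 # unnormalized weight of x
        expected += w * sequential_risk(new_mean, new_var, remaining - 1,
                                        sd_x, unit_cost, theta0,
                                        c_accept_bad, c_reject_good, nodes)
        weight_sum += w
    continue_risk = expected / weight_sum + unit_cost
    return min(stop, continue_risk)                # equations (12), (14), (16), (17)

# Illustrative numbers (assumptions).
ARGS = dict(sd_x=0.5, unit_cost=0.02, theta0=10.0,
            c_accept_bad=5.0, c_reject_good=1.0)
risk_now = sequential_risk(10.2, 0.25, 0, **ARGS)   # decide immediately
risk_one = sequential_risk(10.2, 0.25, 1, **ARGS)   # at most one unit
risk_four = sequential_risk(10.2, 0.25, 4, **ARGS)  # n = 4 as in Figure 2
```

Because stopping is always allowed, the risk cannot increase as the maximum
number of units grows, which the three values above illustrate.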

3.    Comparison of Sequential Versus Simultaneous Sampling

In the design of a quality control procedure the
method of sampling must be specified. In order to determine
which of the two methods is preferred in a given situation
the risks of the two procedures should be compared. In the
absence of other considerations such as ease of implementation,
increased complexity, etc., the decision as to which sampling
scheme, sequential or simultaneous, is optimal should be
based on the risks of the two procedures. That is, if $\rho(s^*)$
is the risk of the optimal sampling scheme $s^*$ then

$$\rho(s^*) = \min[\rho(s_0),\ \rho(s_1)] \tag{18}$$

where   $s_0$ = sequential sampling

        $s_1$ = simultaneous sampling

In  many  cases  the  decision  as  to  which  sampling 
scheme  is  superior  is  obvious  due  to  the  nature  of  the  test. 
For  example,  if  the  cost  of  testing  is  constant  regardless  of 
the  number  of  units  tested  then  simultaneous  testing  would 
provide  minimum  risk.   If  however  the  cost  of  testing  were 
only  a  function  of  the  number  of  units  tested  then  sequential 
testing  would  be  superior.   It  is  when  the  cost  of  testing 
assumes  some  combination  of  the  two  extremes  that  the  optimal 
choice  becomes  unclear,  in  which  case  the  risks  of  each 
procedure  must  be  compared  to  determine  the  optimal  approach. 

B.    APPLICATION  TO  THE  GENERALIZED  PRODUCTION  PROCESS

In order to gain insight into the use and requirements of
the Bayesian decision method, the implementation of the
procedure on the generalized process of Figure 1 will now be
discussed for the simultaneous sampling case.

At times $t_0, t_1, t_2, \ldots$ the following assumptions are
made.

1.    A production sample of size $n_t$ from production
lot $Q_t$ is available for testing.

2.    Samples are independent given $\theta_t$ and the sampling
density $f(x|\theta_t)$ is known.

3.    The cost of testing the $n$ units, $C_n$, is known.

4.    A loss function $L(\theta,d)$ is specified for all $d \in D$
and $\theta \in \Theta$.

5.    At time $t_0$, prior to sampling, the distribution
of $\theta$ is known and denoted by $f(\theta_0)$.

6.    The conditional distribution of $\theta_{t+1}$ given
$\theta_t, \theta_{t-1}, \ldots$ is known and denoted by $f(\theta_{t+1}|\theta_t)$,
where a Markov dependence is assumed for
illustration.

At time $t_0$, prior to sampling, two actions are feasible:
either make a decision (accept or reject) or test the sample
to gain information. If a decision is made without sampling
the risk will be $\rho(d^*)$ as defined by equation (10). The risk
of testing the sample $\underline{x} = (x_1, x_2, \ldots, x_n)$ and making the
optimal decision, $\rho(\phi,\underline{x})$, is given by equation (11). Assume
that sampling represents the minimal risk action. After the
sample result is obtained, the prior $f(\theta_0)$ must be revised
and the optimal decision $d^*$ chosen. Denote the posterior or
new prior based on the data sample by $f(\theta_0|\underline{x})$; then by Bayes
theorem

$$f(\theta_0|\underline{x}) = \frac{f(\underline{x}|\theta_0)\, f(\theta_0)}{f(\underline{x})}$$
The posterior $f(\theta_0|\underline{x})$ is now used to determine the optimal
decision $d^*$, by equation (8):

$$\rho(d^*) = \min_{d \in D} \int_\Theta L(\theta_0,d)\, f(\theta_0|\underline{x})\, d\theta_0$$


After the decision is made on lot $Q_0$ at $t_0$ the procedure steps
to lot $Q_1$ at $t_1$. In order to determine the appropriate
actions concerning this lot the distribution $f(\theta_1)$ must be
obtained. It is at this point that the model of the
production process is used. The relationship between $\theta_0$ and $\theta_1$
must be known in order to determine the density $f(\theta_1)$ based
on the posterior $f(\theta_0|\underline{x})$. The relationship between $\theta_0$ and $\theta_1$
is specified by the conditional density $f(\theta_1|\theta_0)$ which is
obtained from the model of the production process. Methods
by which this density may be obtained from the production
model are discussed in a later section. Given that $f(\theta_1|\theta_0)$
is known, $f(\theta_1)$ prior to sampling from lot $Q_1$ is obtained
as follows:

$$f(\theta_1) = \int_\Theta f(\theta_1|\theta_0)\, f(\theta_0|\underline{x})\, d\theta_0 \tag{19}$$


Using this density as the prior for $\theta_1$, $\rho(d^*)$ and $\rho(\phi,\underline{x})$
are obtained using equations (10) and (11) as before and the
decision rule (9) is applied. If the decision rule indicates
that the risk can be lowered by sampling, the procedure as
outlined for $\theta_0$ is followed. If, however, no sampling is
required then decision $d^*$ is made and the procedure advances
to $t_2$. At $t_2$, $f(\theta_2)$ must be obtained based on $f(\theta_0|\underline{x})$ since
no samples were observed at time $t_1$. The density $f(\theta_2)$ would
be determined as follows:

$$f(\theta_2) = \int_\Theta f(\theta_2|\theta_0)\, f(\theta_0|\underline{x})\, d\theta_0 \tag{20}$$

where   $f(\theta_2|\theta_0) = \int_\Theta f(\theta_2|\theta_1)\, f(\theta_1|\theta_0)\, d\theta_1$ .

After $f(\theta_2)$ is determined $\rho(d^*)$ and $\rho(\phi,\underline{x})$ are obtained as
before and the decision rule applied. The entire procedure
is then repeated to determine if samples should be tested at
$t_3, t_4, \ldots$, etc.
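
Equations (19) and (20) can be sketched numerically by replacing the
integrals with sums over a grid of $\theta$ values. The sketch assumes, purely
for illustration, a Markov process in which $\theta_{t+1}$ is normally
distributed about $\theta_t$ with a known drift standard deviation. This
transition density, the grid and all names are assumptions introduced here.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def propagate(grid, weights, drift_sd):
    """Equation (19) on a grid: sum of f(theta_1|theta_0) f(theta_0|x) over theta_0."""
    new = [sum(normal_pdf(t1, t0, drift_sd) * w for t0, w in zip(grid, weights))
           for t1 in grid]
    return normalize(new)

def mean_var(grid, weights):
    """Mean and variance of a discretized density."""
    m = sum(t * w for t, w in zip(grid, weights))
    v = sum((t - m) ** 2 * w for t, w in zip(grid, weights))
    return m, v

# Illustrative numbers (assumptions): posterior for lot Q_0 after sampling.
grid = [8.0 + 0.02 * i for i in range(201)]          # theta from 8 to 12
post0 = normalize([normal_pdf(t, 10.475, 0.25) for t in grid])

prior1 = propagate(grid, post0, drift_sd=0.1)        # f(theta_1), equation (19)
prior2 = propagate(grid, prior1, drift_sd=0.1)       # f(theta_2), equation (20)

m1, v1 = mean_var(grid, prior1)
m2, v2 = mean_var(grid, prior2)
```

Applying `propagate` twice is the grid form of equation (20), since the
two-step density is the composition of two one-step transitions. Under the
assumed model each unsampled lot widens the prior by the drift variance,
which is why sampling eventually becomes worthwhile again.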

Under the decision process described it may be possible
that no sampling would be required for several production
lots. At first thought this may seem contrary to the objective
of minimizing the decision risk. However, if the production
sequence $\theta_0, \theta_1, \theta_2, \ldots$ is highly correlated then knowledge of
one value of $\theta$ implies considerable knowledge of succeeding (and
preceding) values. The correlation is expressed by the density
$f(\theta_t|\theta_{t-1})$ which is derived from the model of the production
process. The decision process thus quantifies the feeling
"When one lot is good the next one usually is good also."

In order to apply the Bayesian method the loss function,
$L(\theta,d)$, and the sampling and process densities, $f(x|\theta)$ and
$f(\theta_t|\theta_{t-1})$, must be specified. These are the subject of the
following sections.
III.   LOSS  FUNCTIONS

The purpose of this section is to examine various ways
in which the consequences of decisions can be related to the
true value of quality. In the preceding section this
relationship was generally referred to as a loss function, $L(\theta,d)$.
As mentioned previously the loss when the best decision is
made for a given value of $\theta$ is equal to zero. The loss of a
particular decision $d$ for a given value of $\theta$ is the difference
between the consequences if $d$ is chosen and the consequences if
the best decision were chosen. The loss then essentially
represents a regret or opportunity cost. From the above definition
it is seen that one characteristic of loss functions is that
they are non-negative. Since in the decision process the
risk of sampling is added to the cost of testing, the loss
function and testing cost must be expressed in the same units
(e.g., dollars). In the following examples it is assumed that
the utility of money is linear over the range of interest.
This assumption removes the need for the otherwise necessary
transformation of the loss in dollars to utility. If the utility of
money is continuous then at least to a first order
approximation the linear assumption is valid. In the following
paragraphs several examples of loss functions are discussed.
Their presence is not meant to imply that they are in any
way the best or most useful loss functions. The loss function
used for a particular process depends entirely on the
situation at hand.
A.    SYMMETRIC  LINEAR  LOSS  FUNCTION

The following example demonstrates that if the decision
consequences are a linear function of $\theta$ then the loss functions
will be linear and symmetric about their intersection. Let
$R(\theta,d_i)$ represent the consequences of decision $d_i$ when $\theta$
obtains. For the linear case

$$R(\theta,d_1) = a_1\theta + b_1 \tag{21}$$

$$R(\theta,d_2) = a_2\theta + b_2 , \quad a_2 > a_1 . \tag{22}$$

Equations (21) and (22) are plotted in Figure 3.

If the consequences are viewed as a cost then the best
decision is that which minimizes $R(\theta,d)$ for each value of $\theta$.
Thus for $\theta \le \theta_0$ decision $d_2$ is best and $d_1$ is best for
$\theta \ge \theta_0$. The loss $L(\theta,d_1)$ is

$$L(\theta,d_1) = \begin{cases} a_1\theta + b_1 - (a_2\theta + b_2) & \theta \le \theta_0 \\ 0 & \theta > \theta_0 \end{cases}$$

or

$$L(\theta,d_1) = \begin{cases} (b_1 - b_2) - (a_2 - a_1)\theta & \theta \le \theta_0 \\ 0 & \theta > \theta_0 \end{cases}$$

From (21) and (22),

$$\theta_0 = \frac{b_1 - b_2}{a_2 - a_1} .$$

Thus

$$L(\theta,d_1) = \begin{cases} (a_2 - a_1)(\theta_0 - \theta) & \theta \le \theta_0 \\ 0 & \theta > \theta_0 \end{cases} \tag{23}$$
Figure  3.   Decision  Consequences
By similar calculations:

$$L(\theta,d_2) = \begin{cases} 0 & \theta \le \theta_0 \\ (a_2 - a_1)(\theta - \theta_0) & \theta > \theta_0 \end{cases} \tag{24}$$

Equations (23) and (24) are shown in Figure 4.

As evidenced by equations (23) and (24) and depicted in
Figure 4, the loss functions of linear consequences are linear
and symmetric about the intersection.
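
The construction of equations (23) and (24) from the consequences (21) and
(22) can be checked with a short sketch. The function names and the numerical
coefficients are illustrative assumptions; the loss is computed directly as
the regret, that is, the consequence of the chosen decision minus the
consequence of the best decision.

```python
def make_linear_losses(a1, b1, a2, b2):
    """Return (theta0, loss) for R(theta, d_i) = a_i * theta + b_i with a2 > a1."""
    if a2 <= a1:
        raise ValueError("the construction assumes a2 > a1")
    theta0 = (b1 - b2) / (a2 - a1)  # intersection of the two consequence lines

    def consequence(theta, d):
        """Equations (21) and (22), with d equal to 1 or 2."""
        return a1 * theta + b1 if d == 1 else a2 * theta + b2

    def loss(theta, d):
        """Regret: chosen consequence minus the smaller (best) consequence."""
        best = min(consequence(theta, 1), consequence(theta, 2))
        return consequence(theta, d) - best

    return theta0, loss

# Illustrative coefficients (assumptions).
theta0, loss = make_linear_losses(a1=1.0, b1=4.0, a2=3.0, b2=0.0)
loss_d1_below = loss(theta0 - 1.0, 1)  # (a2 - a1)(theta0 - theta), equation (23)
loss_d2_above = loss(theta0 + 1.0, 2)  # (a2 - a1)(theta - theta0), equation (24)
```

For these coefficients the intersection is at $\theta_0 = 2$, and the two losses
evaluated one unit either side of $\theta_0$ are equal, which is the symmetry
noted above.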


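B.    QUADRATIC  LOSS  FUNCTIONS

The quadratic loss function is defined as

$$L(\theta,d_1) = \begin{cases} C_1(\theta - \theta_0)^2 & \theta \le \theta_0 \\ 0 & \theta > \theta_0 \end{cases} \tag{25}$$

$$L(\theta,d_2) = \begin{cases} 0 & \theta \le \theta_0 \\ C_2(\theta_0 - \theta)^2 & \theta > \theta_0 \end{cases} \tag{26}$$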
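
As an illustration of how equations (25) and (26) enter the risk of equation
(1), the sketch below integrates the quadratic loss against a normal density
for $\theta$ by the midpoint rule. The normal density, the constants and the
function names are assumptions made for the example.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def quadratic_loss(theta, d, theta0, c1, c2):
    """Equations (25) and (26), with d equal to 1 or 2."""
    if d == 1:
        return c1 * (theta - theta0) ** 2 if theta <= theta0 else 0.0
    return c2 * (theta0 - theta) ** 2 if theta > theta0 else 0.0

def risk(d, mean, sd, theta0, c1, c2, steps=4000):
    """Equation (1) by the midpoint rule over mean +/- 8 sd."""
    lower = mean - 8.0 * sd
    width = 16.0 * sd / steps
    total = 0.0
    for i in range(steps):
        theta = lower + (i + 0.5) * width
        total += quadratic_loss(theta, d, theta0, c1, c2) \
            * normal_pdf(theta, mean, sd) * width
    return total

# Illustrative numbers (assumptions): density centred on theta0.
risk_d1 = risk(1, mean=10.0, sd=0.5, theta0=10.0, c1=4.0, c2=1.0)
risk_d2 = risk(2, mean=10.0, sd=0.5, theta0=10.0, c1=4.0, c2=1.0)
best = 1 if risk_d1 <= risk_d2 else 2
```

When the density is centred on $\theta_0$ each half contributes half the variance,
so the risks are $C_1\sigma^2/2$ and $C_2\sigma^2/2$, and the decision with the
smaller constant is preferred.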


A heuristic justification for this general form for a
loss function can be made as follows. Assume for a particular
problem that the loss function $L(\theta,d)$ and all its derivatives
exist. Let $\theta_0$ represent the dividing line between acceptable
and unacceptable quality. If $\theta > \theta_0$ let $d_1$ be the proper
decision and if $\theta < \theta_0$ let $d_2$ be the proper decision. Thus
$L(\theta,d_1) = 0$ for $\theta > \theta_0$ and $L(\theta,d_2) = 0$ for $\theta < \theta_0$. Define
$L(\theta) = \max_{d \in D} L(\theta,d)$; then $L(\theta) = L(\theta,d_1)$ for $\theta < \theta_0$ and
$L(\theta) = L(\theta,d_2)$ for $\theta \ge \theta_0$. $L(\theta)$ can be expressed as a Taylor
series expansion about $\theta_0$ as follows:

$$L(\theta) = L(\theta_0) + L'(\theta_0)(\theta - \theta_0) + \frac{L''(\theta_0)}{2!}(\theta - \theta_0)^2 + \ldots \tag{27}$$

Figure  4.   Linear  Loss  Functions.


If $\theta = \theta_0$ then either decision $d_1$ or $d_2$ is optimal and the
loss is zero. This implies that $L(\theta_0) = 0$. Since $L(\theta)$ is
non-negative, $L(\theta_0) = 0$ means that $\theta_0$ is a minimum for $L(\theta)$,
which implies that $L'(\theta_0) = 0$. Further, if $L(\theta_0)$ is a minimum
then $L''(\theta_0)$ must be non-negative.

Applying these results to the Taylor series expansion
(27) yields:

$$L(\theta) = \frac{L''(\theta_0)}{2!}(\theta - \theta_0)^2 + \frac{L'''(\theta_0)}{3!}(\theta - \theta_0)^3 + \ldots$$

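Thus the loss function for $\theta$ close to $\theta_0$ can be approximated
by:

$$L(\theta,d_1) = \begin{cases} C_1(\theta - \theta_0)^2 & \theta < \theta_0 \\ 0 & \theta \ge \theta_0 \end{cases} \tag{28}$$

$$L(\theta,d_2) = \begin{cases} 0 & \theta \le \theta_0 \\ C_2(\theta - \theta_0)^2 & \theta > \theta_0 \end{cases} \tag{29}$$

which is the quadratic loss function originally defined.
Instead of specifying the indifference value $\theta_0$, two values
$\theta_1, \theta_2$ could be specified, where $\theta_1$ would represent minimum
acceptable quality and $\theta_2$ would represent the maximum
rejectable quality. Then for $d_1$ = accept and $d_2$ = reject the
loss function could be expressed as follows.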
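$$L(\theta,d_1) = \begin{cases} C_1(\theta - \theta_1)^2 & \theta \le \theta_1 \\ 0 & \theta > \theta_1 \end{cases}$$

$$L(\theta,d_2) = \begin{cases} 0 & \theta \le \theta_2 \\ C_2(\theta - \theta_2)^2 & \theta > \theta_2 \end{cases}$$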

C.    CONSTANT  LOSS  FUNCTION 

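The constant loss function represents the case where
the loss is independent of the value of $\theta$ over a specified
range. The constant loss function could be represented as
follows:

\[
L(\theta,d_1) = \begin{cases} 1, & \theta < \theta_0 \\ 0, & \theta \ge \theta_0 \end{cases} \tag{30}
\]

\[
L(\theta,d_2) = \begin{cases} 0, & \theta < \theta_0 \\ 1, & \theta \ge \theta_0 \end{cases} \tag{31}
\]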


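The expected loss or risk, $\rho(d)$, for decisions $d_1$ and
$d_2$ would be:

\[
\rho(d_1) = \int_{-\infty}^{\theta_0} 1 \cdot f(\theta)\,d\theta = P(\theta < \theta_0) = \beta
\]

\[
\rho(d_2) = \int_{\theta_0}^{\infty} 1 \cdot f(\theta)\,d\theta = P(\theta \ge \theta_0) = \alpha
\]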
The risk for decision $d_1$ is the probability that $d_2$
is the correct decision and the risk for $d_2$ is the probability
that $d_1$ is the correct decision. $\rho(d_1)$ and $\rho(d_2)$ are
usually referred to as the probability of type II and type I
errors respectively and denoted by $\beta$ and $\alpha$. It can be
shown [1] that the decision function $\delta(\underline{x})$ which minimizes the
risk has the form: if $f_1(\underline{x})/f_2(\underline{x}) < C$ then reject, where

\[
f_1(\underline{x}) = f(\underline{x} \mid \theta_0), \qquad f_2(\underline{x}) = f(\underline{x} \mid \hat{\theta})
\]

and $\hat{\theta}$ is the maximum likelihood estimate for $\theta$ based on $\underline{x}$.

This decision function is the generalized likelihood
ratio criterion upon which classical hypothesis testing is
based. Thus quality control procedures based on classical
hypothesis testing imply that the loss functions of (30)
and (31) are operative.
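As a minimal sketch of the risks under the constant loss functions (30) and (31), the following assumes that the prior on the quality characteristic is normal. That distributional choice, the function name and the numbers are illustrative assumptions and are not taken from the text at this point.

```python
from statistics import NormalDist

def constant_loss_risks(prior_mean, prior_sd, theta0):
    """Risks under the 0-1 losses (30) and (31), assuming a normal prior on theta."""
    prior = NormalDist(prior_mean, prior_sd)
    beta = prior.cdf(theta0)      # rho(d1) = P(theta < theta0)
    alpha = 1.0 - beta            # rho(d2) = P(theta >= theta0)
    return beta, alpha

# Hypothetical prior centered above the indifference value theta0 = 0.
beta, alpha = constant_loss_risks(prior_mean=1.0, prior_sd=2.0, theta0=0.0)
best = "d1" if beta <= alpha else "d2"   # pick the decision with the smaller risk
```

With these losses the better immediate decision is simply the one whose error probability is smaller.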

This  concludes  the  discussion  of  the  most  common  types 
of  loss  functions.   The  next  section  considers  the  problem 
of formulating a statistical model of the production process.
IV.   THE PRODUCTION MODEL

As mentioned previously, in order to apply the Bayesian
decision method the conditional density $f(\theta_t \mid \theta_{t-1})$ must be
known or estimated. In order to specify this density the
quality control specialist must specify the underlying
mechanisms which determine how the production process is
evolving over time. In the following models two assumptions
will be made: (a) that the process is determined by a
specific underlying relationship plus random disturbances
and (b) that by the central limit theorem, the ghost of
Laplace or some other incantation, the disturbances are
assumed to be normally distributed with known mean and
variance. In the following paragraphs three models which
might be used to characterize a production process are
described. They are the linear trend model, the autoregressive
model and the periodic model. The generalized linear
model is also presented. In addition to the process models
a typical observation model will be included for completeness.
For convenience and to aid in the later development of the
solutions for the generalized linear model it is also assumed
that the observation errors are normally distributed.
A.    LINEAR TREND MODEL

The linear trend model represents a production process
where the underlying trend is a linear increase or decrease
in the characteristic $\theta$. Let $\delta_t$ represent the change of $\theta$
from $t-1$ to $t$ and $x_t$ represent the observations at time $t$.
Then the observation and process models could be described
as:

\[
\text{OBSERVATION:} \quad x_t = \theta_t + e_t, \qquad e_t \sim N(0,\sigma_x^2) \tag{32}
\]

\[
\text{PROCESS:} \quad \theta_t = \theta_{t-1} + \delta_t + w_t, \qquad w_t \sim N(0,\sigma^2) \tag{33}
\]

In the above model $e_t$ represents observation noise and
$w_t$ represents the process noise causing deviations from the
linear relationship. From the model, the conditional density
$f(\theta_t \mid \theta_{t-1})$ would be normal with mean
$E(\theta_t) = \theta_{t-1} + \delta_t$ and variance equal to $\sigma^2$.
In order to apply the Bayesian procedure the increment $\delta_t$ and
the variance $\sigma^2$ must be known or estimated from the process.
If the uncertainty in $\delta_t$ is incorporated in the model the result is

\[
\theta_t = \theta_{t-1} + \delta_t + w_t, \qquad w_t \sim N(0,\sigma^2), \qquad \delta_t \sim N(\mu_R,\sigma_R^2)
\]

then $f(\theta_t \mid \theta_{t-1}) \sim N(\theta_{t-1} + \mu_R,\ \sigma_R^2 + \sigma^2)$,
assuming $\delta_t$ and $w_t$ are uncorrelated.
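The conditional density for the linear trend model with an uncertain increment can be checked by simulation. The sketch below is illustrative only: the function names, the seed and the numerical values are assumptions, and the increment and process noise are drawn independently, which is slightly stronger than the uncorrelatedness assumed in the text.

```python
import random
from statistics import fmean, pvariance

def trend_conditional(theta_prev, mu_r, var_r, var_w):
    """Mean and variance of f(theta_t | theta_{t-1}) when the increment is N(mu_r, var_r)."""
    return theta_prev + mu_r, var_r + var_w

def simulate_step(theta_prev, mu_r, var_r, var_w, rng):
    """One draw of theta_t = theta_{t-1} + delta_t + w_t."""
    delta = rng.gauss(mu_r, var_r ** 0.5)
    w = rng.gauss(0.0, var_w ** 0.5)
    return theta_prev + delta + w

rng = random.Random(0)  # fixed seed so the check is repeatable
draws = [simulate_step(10.0, 0.5, 0.04, 0.09, rng) for _ in range(50_000)]
mean, var = trend_conditional(10.0, 0.5, 0.04, 0.09)
sim_mean, sim_var = fmean(draws), pvariance(draws)
```

The simulated mean and variance should agree with the closed-form values up to sampling error.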
B.    AUTOREGRESSIVE MODEL

The basic autoregressive model results from the following
assumptions about the production process.

1.    $f(\theta_t) \sim N(0,\sigma^2) \quad \forall t$

2.    $f(\theta_t,\theta_{t+1}) \sim N(\underline{0},C) \quad \forall t$, where
      $C = \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix}$

Assumption 1 indicates that at any time $t$ the uncertainty
in the location of $\theta_t$ can be expressed as a normal probability
density about the overall process mean (assumed to be zero
in this case) with process variance $\sigma^2$ common for all $t$.
Assumption 2 indicates that the values of $\theta$ at successive
times are not independent and their joint density is
bivariate normal with correlation coefficient $\rho$. The observation
and process model for the autoregressive process can
be expressed as:

\[
\text{OBSERVATION:} \quad x_t = \theta_t + e_t, \qquad e_t \sim N(0,\sigma_x^2) \tag{34}
\]

\[
\text{PROCESS:} \quad \theta_t = \rho\,\theta_{t-1} + w_t, \qquad w_t \sim N(0,\sigma^2(1-\rho^2)) \tag{35}
\]

This model is useful when there appears to be no underlying
trend either up or down in the process and the quality
of successive lots appears to be highly correlated. From
the above model

\[
f(\theta_t \mid \theta_{t-1}) \sim N(\rho\,\theta_{t-1},\ \sigma^2(1-\rho^2))
\]
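The process noise variance in (35) follows from the two assumptions by a short calculation. The sketch below additionally assumes that $w_t$ is independent of $\theta_{t-1}$, which the text does not state explicitly.

\[
\begin{aligned}
\operatorname{Var}(\theta_t) &= \rho^2 \operatorname{Var}(\theta_{t-1}) + \operatorname{Var}(w_t)
  \;\Longrightarrow\; \sigma^2 = \rho^2\sigma^2 + \operatorname{Var}(w_t)
  \;\Longrightarrow\; \operatorname{Var}(w_t) = \sigma^2(1-\rho^2), \\
\operatorname{Cov}(\theta_t,\theta_{t-1}) &= \rho \operatorname{Var}(\theta_{t-1}) = \rho\,\sigma^2 .
\end{aligned}
\]

The first line keeps the marginal variance equal to $\sigma^2$ at every $t$ (assumption 1), and the second reproduces the off-diagonal term of $C$ (assumption 2).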
C.    PERIODIC MODEL

If the production process behaves in periodic fashion
over some interval $T$ then a periodic model is appropriate.
Let $\theta(t)$ represent the periodic function which describes the
production process and let the average value of the periodic
function be zero (i.e., $\frac{1}{T}\int_0^T \theta(t)\,dt = 0$). Then the periodic
function $\theta(t)$ can be approximated by the Fourier series as

\[
\theta(t) = \sum_{k=1}^{n} \left( a_k \cos k\omega_0 t + b_k \sin k\omega_0 t \right), \qquad \omega_0 = \frac{2\pi}{T},
\]

where the coefficients $a_k$ and $b_k$ are unknown and subject to
disturbances, the value of $n$ being large enough to make the
approximation valid.

Let $\underline{\theta} = (a_1\ a_2\ \ldots\ a_n\ b_1\ b_2\ \ldots\ b_n)^T$

and $B_t = (\cos\omega_0 t\ \ \cos 2\omega_0 t\ \ldots\ \cos n\omega_0 t\ \ \sin\omega_0 t\ \ldots\ \sin n\omega_0 t)^T$.

Then the observation and process models are:

\[
\text{OBSERVATION:} \quad x_t = B_t^T\underline{\theta}_t + e_t, \qquad e_t \sim N(0,\sigma_x^2) \tag{36}
\]

\[
\text{PROCESS:} \quad \underline{\theta}_t = \underline{\theta}_{t-1} + \underline{w}_t, \qquad \underline{w}_t \sim N(\underline{0},\Sigma) \tag{37}
\]

where $\underline{0}$ is a $2n \times 1$ vector of zeros and $\Sigma = E[\underline{w}_t \underline{w}_t^T]$.
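The observation vector of the periodic model is straightforward to construct. The following is a minimal sketch: the helper names, the coefficient values and the period are hypothetical, and the coefficients are ordered as the cosine terms followed by the sine terms.

```python
import math

def design_vector(t, n, period):
    """B_t: n cosine terms then n sine terms at harmonics of omega_0 = 2*pi/period."""
    w0 = 2.0 * math.pi / period
    cosines = [math.cos(k * w0 * t) for k in range(1, n + 1)]
    sines = [math.sin(k * w0 * t) for k in range(1, n + 1)]
    return cosines + sines

def process_value(theta, t, period):
    """theta(t) = B_t^T theta for coefficients theta = (a_1..a_n, b_1..b_n)."""
    n = len(theta) // 2
    return sum(b * c for b, c in zip(design_vector(t, n, period), theta))

coeffs = [1.0, 0.5, 0.2, -0.3]   # hypothetical a_1, a_2, b_1, b_2
value_now = process_value(coeffs, t=3.0, period=12.0)
value_next_cycle = process_value(coeffs, t=15.0, period=12.0)
```

Because every term is a harmonic of the fundamental frequency, the value one full period later is unchanged when the coefficients are held fixed.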


D.    GENERAL  LINEAR  MODEL 

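The above models represent special cases of the generalized
linear model incorporating special features to reflect
particular characteristics of the production process. The
generalized linear model is defined as follows:

Let  $\underline{x}_t$ = $n \times 1$ vector of observations at time $t$

     $\underline{\theta}_t$ = $p \times 1$ vector of process parameters at time $t$

     $A_1$ = $n \times p$ matrix characterizing the observation

     $A_2$ = $p \times p$ matrix characterizing the process

     $\underline{e}_t$ = $n \times 1$ vector of observation noise at time $t$

     $\underline{w}_t$ = $p \times 1$ vector of process noise at time $t$

     $E[\underline{e}_t] = E[\underline{w}_t] = \underline{0}$

     $\mathrm{VAR}(\underline{e}_t) = E[\underline{e}_t \underline{e}_t^T] = C_1$

     $\mathrm{VAR}(\underline{w}_t) = E[\underline{w}_t \underline{w}_t^T] = C_2$

Then

\[
\text{OBSERVATION:} \quad \underline{x}_t = A_1\underline{\theta}_t + \underline{e}_t, \qquad \underline{e}_t \sim N(\underline{0},C_1) \tag{38}
\]

\[
\text{PROCESS:} \quad \underline{\theta}_t = A_2\underline{\theta}_{t-1} + \underline{w}_t, \qquad \underline{w}_t \sim N(\underline{0},C_2) \tag{39}
\]

is the generalized linear model.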

In order to implement the Bayesian decision procedure
various probability densities are required to determine the
risks of alternative actions. From section II it can be seen
that three densities must be obtained in the course of the
procedure. In order to obtain the risk of immediate decision
without sampling, the prior distribution of $\theta$, $f(\theta)$, must be
obtained. To obtain the risk of sampling, the prior distribution
of $\underline{x}$, $f(\underline{x})$, and the conditional distribution of $\theta$ given a
sample $\underline{x}$, $f(\theta \mid \underline{x})$, must be obtained. In order to determine the
optimal decision after sampling, $f(\theta \mid \underline{x})$ is required. Thus to
implement the decision procedure three densities must be
obtained at each step in the process. If the observation and
production process can be modeled by the generalized linear
model of (38) and (39) the required densities can be obtained
as follows [4].

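Let:   $f(\underline{\theta}_t)$ be the density of $\theta$ at time $t$ prior to
       sampling

       $f(\underline{\theta}_t \mid \underline{x}_t)$ be the posterior of $\theta$ at time $t$ based
       on the sample $\underline{x}_t$

       $f(\underline{x}_t)$ be the sample distribution prior to sampling

Also let the distribution of $\theta$ at $t-1$ be $N(\underline{\mu},\Sigma)$.
Then from (38) and (39)

\[
f(\underline{\theta}_t) \sim N(A_2\underline{\mu},\ A_2\Sigma A_2^T + C_2) \tag{40}
\]

\[
f(\underline{x}_t) \sim N(A_1A_2\underline{\mu},\ A_1(A_2\Sigma A_2^T + C_2)A_1^T + C_1) \tag{41}
\]

$f(\underline{\theta}_t \mid \underline{x}_t)$ is obtained as follows (where the matrices for which
inverses are needed are assumed non-singular). From Bayes
theorem $f(\underline{\theta}_t \mid \underline{x}_t) \propto f(\underline{x}_t \mid \underline{\theta}_t)\, f(\underline{\theta}_t)$.

From the model $f(\underline{x}_t \mid \underline{\theta}_t) \sim N(A_1\underline{\theta}_t,\ C_1)$.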

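Thus $f(\underline{\theta}_t \mid \underline{x}_t) \propto e^{-\frac{1}{2}Q}$

where

\[
Q = (\underline{x} - A_1\underline{\theta})^T C_1^{-1} (\underline{x} - A_1\underline{\theta})
  + (\underline{\theta} - A_2\underline{\mu})^T (A_2\Sigma A_2^T + C_2)^{-1} (\underline{\theta} - A_2\underline{\mu}),
\]

the $t$ subscripts being deleted for convenience. Collecting
the $\theta$ terms yields: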
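\[
Q = \underline{\theta}^T(\Lambda + A_1^T C_1^{-1} A_1)\underline{\theta}
  - 2(\underline{x}^T C_1^{-1} A_1 + \underline{\mu}^T A_2^T \Lambda)\underline{\theta}
  + \underline{x}^T C_1^{-1}\underline{x} + \underline{\mu}^T A_2^T \Lambda A_2 \underline{\mu}
\]

where $\Lambda = (A_2\Sigma A_2^T + C_2)^{-1}$.
Completing the square results in

\[
Q = (\underline{\theta} - D\underline{d})^T D^{-1} (\underline{\theta} - D\underline{d})
  + \left[\underline{x}^T C_1^{-1}\underline{x} + \underline{\mu}^T A_2^T \Lambda A_2 \underline{\mu} - \underline{d}^T D\,\underline{d}\right]
\]

Thus

\[
f(\underline{\theta}_t \mid \underline{x}_t) \sim N(D\underline{d},\ D) \tag{42}
\]

where

\[
D^{-1} = (A_2\Sigma A_2^T + C_2)^{-1} + A_1^T C_1^{-1} A_1
\]

\[
\underline{d} = A_1^T C_1^{-1}\underline{x} + (A_2\Sigma A_2^T + C_2)^{-1} A_2\underline{\mu}
\]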

Thus  with  the  aid  of  the  generalized  linear  model  the  required 
densities  for  complex  multidimensional  production  processes 
can  be  evaluated  in  a  straightforward  manner  using  (40),  (41) 
and  (42). 
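To make the three densities concrete, the sketch below specializes (40), (41) and (42) to the scalar case, where every matrix is a single number. The function names and the numerical values are hypothetical; the matrix case follows the same steps with matrix products and inverses in place of the scalar operations.

```python
def prior_at_t(mu, var, a2, c2):
    """Equation (40), scalar case: mean and variance of theta_t before sampling."""
    return a2 * mu, a2 * var * a2 + c2

def sample_predictive(mu, var, a1, a2, c1, c2):
    """Equation (41), scalar case: mean and variance of x_t before sampling."""
    m, p = prior_at_t(mu, var, a2, c2)
    return a1 * m, a1 * p * a1 + c1

def posterior(x, mu, var, a1, a2, c1, c2):
    """Equation (42), scalar case: f(theta_t | x_t) ~ N(D*d, D)."""
    m, p = prior_at_t(mu, var, a2, c2)
    d_inv = 1.0 / p + a1 * a1 / c1      # D^{-1}
    d = a1 * x / c1 + m / p             # d
    big_d = 1.0 / d_inv
    return big_d * d, big_d

# Hypothetical numbers: theta at t-1 is N(0.5, 1.0), one observation x = 1.2.
post_mean, post_var = posterior(x=1.2, mu=0.5, var=1.0, a1=1.0, a2=0.8, c1=0.25, c2=0.36)
pred_mean, pred_var = sample_predictive(mu=0.5, var=1.0, a1=1.0, a2=0.8, c1=0.25, c2=0.36)
```

Here the prior at time t is N(0.4, 1.0), and the posterior mean lies between that prior mean and the observation, weighted by the relative precisions.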

Equations (40) and (41) represent the one step ahead
predictive distributions of $\theta$ and $x$ based on the prior at $t-1$.
As discussed in section II, if no sampling is performed at
some $t$ then the predictive distribution for $\theta$ at $t+1$ based on
the prior at $t-1$ is required. In general, a method is needed
to obtain the $k^{\text{th}}$ step ahead predictive distributions of $\theta$
and $x$.

Let $f(\underline{\theta}_t) \sim N(\underline{\mu}_t,\Sigma_t)$; then the $k^{\text{th}}$ step ahead distribution
$f(\underline{\theta}_{t+k}) \sim N(\underline{\mu}_{t+k},\Sigma_{t+k})$ can be obtained recursively from the
linear model as follows:
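\[
\underline{\mu}_{t+k} = A_2\,\underline{\mu}_{t+k-1}, \qquad k = 0,1,2,\ldots \tag{43}
\]

\[
\Sigma_{t+k} = A_2\,\Sigma_{t+k-1}A_2^T + C_2, \qquad k = 0,1,2,\ldots \tag{44}
\]

Let $f(\underline{x}_{t+k}) \sim N(\underline{m}_{t+k},C_{t+k})$ be the $k^{\text{th}}$ step ahead predictive
distribution for the sample $\underline{x}_{t+k}$; then from (43) and (44) and
the linear model, $\underline{m}_{t+k}$ and $C_{t+k}$ can be determined recursively
by:

\[
\underline{m}_{t+k} = A_1\,\underline{\mu}_{t+k}, \qquad k = 0,1,2,\ldots \tag{45}
\]

\[
C_{t+k} = A_1\,\Sigma_{t+k}A_1^T + C_1, \qquad k = 0,1,2,\ldots \tag{46}
\]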


For  an  example  of  the  use  of  the  linear  model  and  the 
risk  calculations  the  reader  is  referred  to  Appendix  A.   As 
an  interesting  aside,  the  recursion  relationships  developed  in 
(42)  thru  (46)  are  identical  to  the  results  obtained  using 
Kalman  filtering. 
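The correspondence with the Kalman filter can be made explicit. The symbols $P$ and $K$ below are introduced here for illustration and do not appear in the text; the identities follow from the matrix inversion lemma, assuming the indicated inverses exist.

\[
\begin{aligned}
P &= A_2\Sigma A_2^T + C_2, \qquad K = P A_1^T (A_1 P A_1^T + C_1)^{-1}, \\
D &= \left(P^{-1} + A_1^T C_1^{-1} A_1\right)^{-1} = P - K A_1 P, \\
D\underline{d} &= A_2\underline{\mu} + K\left(\underline{x} - A_1 A_2\underline{\mu}\right).
\end{aligned}
\]

In filtering terms $P$ is the predicted covariance from (40), $A_1 P A_1^T + C_1$ is the covariance of the predicted sample from (41), and $K$ plays the role of the gain applied to the difference between the sample and its predicted mean.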
V.   INCORPORATING SUBJECTIVE INFORMATION

One of the primary advantages of the Bayesian decision
procedure is its capacity to incorporate subjective information
into the decision process. Subjective information can
be incorporated into the decision process by either of two
routes: by revisions to the process model or by
altering the prior distribution of $\theta$. The method chosen
depends on which more accurately reflects the subjective
information. Examples of how subjective information may be
used will be discussed with respect to the generalized linear
model, which is repeated here for reference:

\[
\text{OBSERVATION:} \quad \underline{x}_t = A_1\underline{\theta}_t + \underline{e}_t, \qquad \underline{e}_t \sim N(\underline{0},C_1)
\]

\[
\text{PROCESS:} \quad \underline{\theta}_t = A_2\underline{\theta}_{t-1} + \underline{w}_t, \qquad \underline{w}_t \sim N(\underline{0},C_2)
\]


As an example of the use of subjective information, suppose
that the autoregressive model of section IV is being used to
model the production process and production appears to be
fairly stable (i.e., no trends). You are informed that
starting with the next production lot three engineering
changes will be incorporated into the units. It has been your
experience that whenever more than one engineering change is
incorporated the production quality is momentarily
reduced and then increases with successive lots as the new
procedures are learned and the inspectors gain experience.
How might this subjective information be incorporated into
the decision process? One approach would be to adjust the
prior for the first change lot by lowering the mean to
reflect the anticipated decrease in quality and increasing
the variance to reflect the associated uncertainty as to the
actual process value. Increasing the variance will have the
effect of weighting the new sample data more heavily in
determining the posterior. After adjusting the prior to
reflect the anticipated decrease in quality, the autoregressive
model might be replaced by the linear trend model to
reflect the anticipated increase of quality with successive
lots as a result of the learning process. The rate of
increase $\delta_t$ in the linear model could be changed for each lot
to provide a linear approximation to the anticipated learning
curve. As the process quality returned to its original
level the autoregressive model would again be used.
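The effect of inflating the prior variance can be illustrated numerically. The sketch below assumes a scalar quality characteristic with the normal prior and normal observation noise of Appendix A; the function name and all numbers are hypothetical.

```python
def posterior_mean_var(prior_mean, prior_var, sample_mean, n, obs_var):
    """Normal-normal update for n observations with known observation variance."""
    noise = obs_var / n
    weight = prior_var / (prior_var + noise)   # weight given to the sample mean
    mean = weight * sample_mean + (1.0 - weight) * prior_mean
    var = prior_var * noise / (prior_var + noise)
    return mean, var, weight

# Same lowered prior mean, two levels of prior uncertainty about the change lot.
confident = posterior_mean_var(prior_mean=-1.0, prior_var=0.25,
                               sample_mean=0.4, n=5, obs_var=1.0)
cautious = posterior_mean_var(prior_mean=-1.0, prior_var=4.0,
                              sample_mean=0.4, n=5, obs_var=1.0)
```

With the larger prior variance the weight on the sample mean is closer to one, so the posterior mean moves further from the subjective prior mean toward the observed data.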

This example illustrates two important features of the
Bayesian decision procedure and the use of the linear model.
First, when using a linear model it is not necessary that
$A_1$, $A_2$, $C_1$ and $C_2$ be constant for all $t$, only that they be
known at time $t$; thus the parameters of the linear model
are free to change as required by the process being modeled.
The second feature of the Bayesian procedure illustrated in
the example is adaptability. By changing the model structure
or the prior to reflect uncertainty in the process, the
information requirements (sample data) of the procedure
adapt to reflect these changes. In order to maintain the
same risk more samples will be required if the variance of
the prior is increased to reflect uncertainty. Thus, unlike
traditional quality control procedures where the same sampling
and decision procedure is always used, the Bayesian method can adapt
to reflect the changing requirements of the production
process. As another example of adaptability consider the
autoregressive model. Because of the high correlation from
one lot to the next the method reduces the sampling required
when quality is either very good or very poor, thus taking
advantage of the natural excursions of the output quality.
The adaptability feature of the Bayesian method also
has another interesting property. It indicates where, when,
and in what quantity quality control resources should be used.
This is especially important when trying to maximize the
effectiveness of the quality control function on fixed or
limited resources (e.g., labor, test facilities, etc.).
APPENDIX A
A BAYESIAN DECISION EXAMPLE

The following example is provided to illustrate how the
Bayesian method is applied and the required risks are
obtained. It is assumed that an autoregressive model is used
to represent the production process and that simultaneous
sampling is used.

Let:

\[
\text{OBSERVATION MODEL:} \quad x_t = \theta_t + e_t, \qquad e_t \sim N(0,\sigma_x^2)
\]

\[
\text{PROCESS MODEL:} \quad \theta_t = \rho\,\theta_{t-1} + w_t, \qquad w_t \sim N(0,\sigma^2(1-\rho^2))
\]

and the loss function is:

\[
L(\theta,d_1) = \exp[-(\theta-\theta_0)], \qquad -\infty < \theta < \infty
\]

\[
L(\theta,d_2) = \exp[-(\theta_0-\theta)], \qquad -\infty < \theta < \infty
\]

where $\theta_0$ is a known constant.

Also let $f(\theta_{t-1}) \sim N(\mu_{t-1},\sigma_{t-1}^2)$ be the density of $\theta$ at time
$t-1$.

From section IV the correspondence between the autoregressive
model and the generalized linear model is:
$A_1 = 1$, $A_2 = \rho$, $C_1 = \sigma_x^2$, and $C_2 = \sigma^2(1-\rho^2)$. Thus by
equations (40) and (41), making the above substitutions:
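\[
f(\theta_t) \sim N(\rho\mu_{t-1},\ \rho^2\sigma_{t-1}^2 + (1-\rho^2)\sigma^2)
\]

\[
f(x) \sim N(\rho\mu_{t-1},\ \rho^2\sigma_{t-1}^2 + (1-\rho^2)\sigma^2 + \sigma_x^2)
\]

Let $\mu_t = \rho\mu_{t-1}$ and $\sigma_t^2 = \rho^2\sigma_{t-1}^2 + (1-\rho^2)\sigma^2$; then

\[
f(\theta_t) \sim N(\mu_t,\ \sigma_t^2) \tag{1}
\]

\[
f(x) \sim N(\mu_t,\ \sigma_t^2 + \sigma_x^2) \tag{2}
\]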


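To determine the risk of making an immediate decision, from
equation (10)

\[
\rho(d_1) = \int L(\theta,d_1)\, f(\theta_t)\, d\theta_t = \exp\!\left[\frac{\sigma_t^2}{2} - (\mu_t - \theta_0)\right] \tag{3}
\]

and

\[
\rho(d_2) = \int L(\theta,d_2)\, f(\theta_t)\, d\theta_t = \exp\!\left[\frac{\sigma_t^2}{2} + (\mu_t - \theta_0)\right] \tag{4}
\]

Equations (3) and (4) are plotted in figure A-1 as a function
of the mean $\mu_t$.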

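Since $\rho(d^*)$, the risk of the optimal decision, is equal to
the minimum risk, from figure A-1 it is observed that:

\[
\rho(d^*) = \begin{cases} \rho(d_2), & \mu_t < \theta_0 \\ \rho(d_1), & \mu_t \ge \theta_0 \end{cases} \tag{5}
\]

which implies the following decision rule

\[
d^* = \begin{cases} d_2 & \text{if } \mu_t < \theta_0 \\ d_1 & \text{if } \mu_t \ge \theta_0 \end{cases}
\]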
Figure A-1. Decision Risk

By symmetry the risk of decision $d^*$ is

\[
\rho(d^*) = \exp\!\left[\frac{\sigma_t^2}{2} - |\mu_t - \theta_0|\right]
\]
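The closed forms for the immediate risks can be checked against direct numerical integration. The sketch below is illustrative: the function names, the midpoint integration rule and the numerical values are assumptions, not part of the original example.

```python
import math

def immediate_risks(mu_t, var_t, theta0):
    """Equations (3) and (4): expected exponential loss under theta_t ~ N(mu_t, var_t)."""
    rho_d1 = math.exp(var_t / 2.0 - (mu_t - theta0))
    rho_d2 = math.exp(var_t / 2.0 + (mu_t - theta0))
    return rho_d1, rho_d2

def best_immediate(mu_t, var_t, theta0):
    """Equation (5): choose d1 when mu_t >= theta0, otherwise d2."""
    rho_d1, rho_d2 = immediate_risks(mu_t, var_t, theta0)
    return ("d1", rho_d1) if mu_t >= theta0 else ("d2", rho_d2)

def numeric_risk_d1(mu_t, var_t, theta0, steps=20_000, width=10.0):
    """Midpoint-rule check of (3): integrate exp[-(theta - theta0)] times the normal density."""
    sd = math.sqrt(var_t)
    lo, hi = mu_t - width * sd, mu_t + width * sd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        th = lo + (i + 0.5) * h
        pdf = math.exp(-0.5 * ((th - mu_t) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += math.exp(-(th - theta0)) * pdf * h
    return total

decision, risk = best_immediate(mu_t=0.3, var_t=0.5, theta0=0.0)
```

The risk returned for the better decision equals the symmetric expression with the absolute value, since only the sign of the difference between the prior mean and the indifference value changes between the two cases.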


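In order to determine the risk of sampling and then making
a decision $d^*$ the posterior $f(\theta_t \mid \underline{x})$ must be obtained. The
process of testing a sample of size $n$ parameterized by a
fixed but unknown value $\theta_t$ can be viewed as:

\[
x_i = \theta_i + e_i, \qquad e_i \sim N(0,\sigma_x^2)
\]

\[
\theta_i = \theta_{i-1}, \qquad i = 1,2,\ldots,n
\]

where the samples are assumed independent. Applying the
linear model: $A_1 = 1$, $A_2 = 1$, $C_1 = \sigma_x^2$, and $C_2 = 0$.

By repeated application of equation (42) one obtains:

\[
f(\theta_t \mid \underline{x}) \sim N\!\left( \frac{\bar{x}_n\sigma_t^2 + \mu_t\,\sigma_x^2/n}{\sigma_t^2 + \sigma_x^2/n},\ \frac{\sigma_t^2\,\sigma_x^2/n}{\sigma_t^2 + \sigma_x^2/n} \right)
\]

where: $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$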
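The statement that the posterior follows from repeated application of the one-step update can be verified directly. In the sketch below the function names and the measurements are hypothetical; the sequential version applies the scalar form of the posterior update once per observation, with the process held fixed between observations.

```python
def batch_posterior(mu_t, var_t, xs, obs_var):
    """Posterior of theta_t after all n observations, written in terms of the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    noise = obs_var / n
    mean = (xbar * var_t + mu_t * noise) / (var_t + noise)
    var = var_t * noise / (var_t + noise)
    return mean, var

def sequential_posterior(mu_t, var_t, xs, obs_var):
    """Same posterior via one scalar update per observation (A1 = A2 = 1, C2 = 0)."""
    mean, var = mu_t, var_t
    for x in xs:
        d_inv = 1.0 / var + 1.0 / obs_var   # D^{-1} of the one-step update
        d = x / obs_var + mean / var        # d of the one-step update
        var = 1.0 / d_inv
        mean = var * d
    return mean, var

xs = [0.9, 1.4, 1.1, 0.7]   # hypothetical measurements
batch = batch_posterior(0.0, 1.0, xs, obs_var=0.5)
stepwise = sequential_posterior(0.0, 1.0, xs, obs_var=0.5)
```

Both routes give the same mean and variance, which is why only the sample mean and the sample size are needed to summarize the data.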


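From section II the risk of obtaining an additional sample
and then choosing $d^*$ is:

\[
\rho(\underline{x}) = \int \left\{ \inf_{d} \int L(\theta,d)\, f(\theta \mid \underline{x})\, d\theta \right\} f(\underline{x})\, d\underline{x} \tag{6}
\]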
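The expression in brackets is $\rho(d^* \mid \underline{x})$, that is, the risk of
$d^*$ given a sample result $\underline{x}$. $\rho(d_1 \mid \underline{x})$ and $\rho(d_2 \mid \underline{x})$ may be
obtained by substituting $f(\theta_t \mid \underline{x})$ for $f(\theta_t)$ in equations (3)
and (4). This substitution yields:

\[
\rho(d_1 \mid \underline{x}) = \exp\!\left[\tfrac{1}{2}\tilde{\sigma}^2 - (\tilde{\theta} - \theta_0)\right]
\]

\[
\rho(d_2 \mid \underline{x}) = \exp\!\left[\tfrac{1}{2}\tilde{\sigma}^2 + (\tilde{\theta} - \theta_0)\right]
\]

where

\[
\tilde{\sigma}^2 = \frac{\sigma_t^2\,\sigma_x^2/n}{\sigma_t^2 + \sigma_x^2/n}, \qquad
\tilde{\theta} = \frac{\bar{x}_n\sigma_t^2 + \mu_t\,\sigma_x^2/n}{\sigma_t^2 + \sigma_x^2/n}
\]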


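From equation (5)

\[
\rho(d^* \mid \underline{x}) = \begin{cases} \rho(d_2 \mid \underline{x}), & \tilde{\theta} < \theta_0 \\ \rho(d_1 \mid \underline{x}), & \tilde{\theta} \ge \theta_0 \end{cases}
\]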


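Equation (6) becomes:

\[
\rho(\underline{x}) = \int_{-\infty}^{x_c} \rho(d_2 \mid \bar{x}_n)\, f(\bar{x}_n)\, d\bar{x}_n + \int_{x_c}^{\infty} \rho(d_1 \mid \bar{x}_n)\, f(\bar{x}_n)\, d\bar{x}_n \tag{7}
\]

where: $x_c = \dfrac{(\theta_0 - \mu_t)\,\sigma_x^2}{n\,\sigma_t^2} + \theta_0$ and $f(\bar{x}_n) \sim N\!\left(\mu_t,\ \sigma_t^2 + \dfrac{\sigma_x^2}{n}\right)$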


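Solving (7) yields

\[
\rho(\underline{x}) = \rho(d_1)\,\bar{\Phi}\!\left( \frac{x_c + \sigma_t^2 - \mu_t}{\sqrt{\sigma_t^2 + \sigma_x^2/n}} \right)
 + \rho(d_2)\,\Phi\!\left( \frac{x_c - \sigma_t^2 - \mu_t}{\sqrt{\sigma_t^2 + \sigma_x^2/n}} \right) \tag{8}
\]

where: $\rho(d_1)$ and $\rho(d_2)$ are as defined in (3) and (4),

       $\Phi(\cdot) = P(z < \cdot)$, $z \sim N(0,1)$

       $\bar{\Phi}(\cdot) = 1 - \Phi(\cdot)$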

Based on equations (5) and (8) the usual decision rule is
applied. If $\rho(d^*) > \rho(\underline{x}) + E[C(\underline{x})]$ take another sample;
otherwise make decision $d^*$.
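The sampling risk and the resulting sample-or-decide rule can be sketched as follows. The function names, the fixed sample size and the cost figure are hypothetical; the closed form is the expression obtained by integrating the posterior risks over the predictive density of the sample mean, with the cut point taken as the sample mean at which the posterior mean equals the indifference value.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for Phi

def sampling_risk(mu_t, var_t, theta0, n, obs_var):
    """Risk of observing n units and then taking the better decision (closed form)."""
    s = math.sqrt(var_t + obs_var / n)                # std. dev. of the sample mean
    x_c = theta0 + (theta0 - mu_t) * obs_var / (n * var_t)
    rho_d1 = math.exp(var_t / 2.0 - (mu_t - theta0))  # immediate risk of d1
    rho_d2 = math.exp(var_t / 2.0 + (mu_t - theta0))  # immediate risk of d2
    upper = 1.0 - STD.cdf((x_c + var_t - mu_t) / s)   # Phi-bar term
    lower = STD.cdf((x_c - var_t - mu_t) / s)         # Phi term
    return rho_d1 * upper + rho_d2 * lower

def should_sample(mu_t, var_t, theta0, n, obs_var, expected_cost):
    """Sample only if the immediate risk exceeds sampling risk plus expected cost."""
    immediate = math.exp(var_t / 2.0 - abs(mu_t - theta0))
    return immediate > sampling_risk(mu_t, var_t, theta0, n, obs_var) + expected_cost

risk_now = math.exp(0.5 / 2.0 - abs(0.1))
risk_after = sampling_risk(mu_t=0.1, var_t=0.5, theta0=0.0, n=4, obs_var=1.0)
```

Sampling is worthwhile here only while the reduction from `risk_now` to `risk_after` exceeds the expected cost of the sample.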

To determine the mean and variance of the $k^{\text{th}}$ step ahead
predictive distribution, $\mu_{t+k}$ and $\sigma_{t+k}^2$, equations (43) and (44)
are used recursively to obtain the following results.

\[
\mu_{t+k} = \rho^k \mu_t \tag{9}
\]

\[
\sigma_{t+k}^2 = \rho^{2k}\sigma_t^2 + \sigma^2(1-\rho^{2k}) \tag{10}
\]

The predictive distribution for the $k^{\text{th}}$ step ahead sample,
parameterized by $m_{t+k}$ and $C_{t+k}$, is by (45) and (46):

\[
m_{t+k} = \mu_{t+k}
\]

\[
C_{t+k} = \sigma_{t+k}^2 + \sigma_x^2
\]
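The closed forms for the k step ahead mean and variance can be compared with the recursion they were derived from. This is a minimal sketch with hypothetical function names and numbers, using the autoregressive correspondence in which the process coefficient is the correlation and the process noise variance is the process variance times one minus the squared correlation.

```python
def k_step_recursive(mu_t, var_t, rho, proc_var, k):
    """Apply the one-step recursion k times with A2 = rho, C2 = proc_var * (1 - rho**2)."""
    mu, var = mu_t, var_t
    for _ in range(k):
        mu = rho * mu
        var = rho * var * rho + proc_var * (1.0 - rho ** 2)
    return mu, var

def k_step_closed_form(mu_t, var_t, rho, proc_var, k):
    """Closed-form k step ahead mean and variance of the process."""
    return rho ** k * mu_t, rho ** (2 * k) * var_t + proc_var * (1.0 - rho ** (2 * k))

def sample_predictive(mu_k, var_k, obs_var):
    """Predictive mean and variance of the k step ahead sample (A1 = 1)."""
    return mu_k, var_k + obs_var

recursive = k_step_recursive(mu_t=1.5, var_t=0.2, rho=0.9, proc_var=1.0, k=6)
closed = k_step_closed_form(mu_t=1.5, var_t=0.2, rho=0.9, proc_var=1.0, k=6)
far_ahead = k_step_closed_form(mu_t=1.5, var_t=0.2, rho=0.9, proc_var=1.0, k=200)
```

For large k the mean decays toward zero and the variance approaches the overall process variance, regardless of what was known at time t.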

In order to examine the behavior of the posterior of $\theta$,
$f(\theta \mid \underline{x})$, as the sample size is increased observe that

\[
\lim_{n\to\infty} f(\theta_t \mid \underline{x}) = \lim_{n\to\infty} N\!\left( \frac{\bar{x}_n\sigma_t^2 + \mu_t\,\sigma_x^2/n}{\sigma_t^2 + \sigma_x^2/n},\ \frac{\sigma_t^2\,\sigma_x^2/n}{\sigma_t^2 + \sigma_x^2/n} \right)
\]

Thus $\lim_{n\to\infty} f(\theta_t \mid \underline{x}) \to N(\theta_t, 0)$,
since $\bar{x}_n$ is a consistent estimator of $\theta_t$. This implies that
as $n$ increases the knowledge of $\theta_t$ as represented by $f(\theta_t \mid \underline{x})$
becomes "perfect" in the sense that the variance approaches
zero and $\theta_t$ is then known. The variance decreases approximately
as $\frac{1}{n}$ for $n \gg \frac{\sigma_x^2}{\sigma_t^2}$.

The $k^{\text{th}}$ step ahead prediction distribution, $f(\theta_{t+k})$ of
(9) and (10), represents the uncertainty in $\theta_{t+k}$ based on
information up to and including time $t$. The
$\lim_{k\to\infty} f(\theta_{t+k}) \to N(0,\sigma^2)$, which is the distribution
of the process before any information is obtained. Thus as $k$ increases the information
obtained at time $t$ loses its "value" in predicting the
location of the process at $t+k$. The rate at which previous
knowledge is discounted is a function of $\rho$, the correlation
between $\theta_{t-1}$ and $\theta_t$, which in this example was assumed
constant for all $t$.